AITopics | cluster label

Collaborating Authors

cluster label

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Explainable cluster analysis: a bagging approach

Quetti, Federico Maria, Ballante, Elena, Figini, Silvia, Giudici, Paolo

arXiv.org Machine LearningMar-23-2026

A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering framework that integrates bagging and feature dropout to generate feature importance scores, in analogy with feature importance mechanisms in supervised random forests. By leveraging multiple bootstrap resampling schemes and aggregating the resulting partitions, the method improves stability and robustness of the cluster definition, particularly in small-sample or noisy settings. Feature importance is assessed through an information-theoretic approach: at each step, the mutual information between each feature and the estimated cluster labels is computed and weighted by a measure of clustering validity to emphasize well-formed partitions, before being aggregated into a final score. The method outputs both a consensus partition and a corresponding measure of feature importance, enabling a unified interpretation of clustering structure and variable relevance. Its effectiveness is demonstrated on multiple simulated and real-world datasets.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

2603.1984

Country:

Europe > Italy (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Wisconsin (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Add feedback

427e3427c5f38a41bb9cb26525b22fba-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 09:35:58 GMT

circle-torus model, inequality, oronoi cell, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
Europe > Austria > Vienna (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Wittgenstein's Family Resemblance Clustering Algorithm

Amanpour, Golbahar, Ghojogh, Benyamin

arXiv.org Machine LearningJan-8-2026

This paper, introducing a novel method in philo-matics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. According to Wittgenstein's Philosophical Investigations (1953), family resemblance holds that members of a concept or category are connected by overlapping similarities rather than a single defining property. Consequently, a family of entities forms a chain of items sharing overlapping traits. This philosophical idea naturally lends itself to a graph-based approach in machine learning. Accordingly, we propose the Wittgenstein's Family Resemblance (WFR) clustering algorithm and its kernel variant, kernel WFR. This algorithm computes resemblance scores between neighboring data instances, and after thresholding these scores, a resemblance graph is constructed. The connected components of this graph define the resulting clusters. Simulations on benchmark datasets demonstrate that WFR is an effective nonlinear clustering algorithm that does not require prior knowledge of the number of clusters or assumptions about their shapes.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2601.01127

Country: North America > Canada (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

In-Context Clustering with Large Language Models

Wang, Ying, Ren, Mengye, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceOct-10-2025

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex relationships among inputs through an attention mechanism. We show that pretrained LLMs exhibit impressive zero-shot clustering capabilities on text-encoded numeric data, with attention matrices showing salient cluster patterns. Spectral clustering using attention matrices offers surprisingly competitive performance. We further enhance the clustering capabilities of LLMs on numeric and image data through fine-tuning using the Next Token Prediction (NTP) loss. Moreover, the flexibility of LLM prompting enables text-conditioned image clustering, a capability that classical clustering methods lack. Our work extends in-context learning to an unsupervised setting, showcasing the effectiveness and flexibility of LLMs for clustering. Our code is available at https://agenticlearning.ai/icc.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.08466

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

c132c02176577c4319a878f6417a331a-Paper-Conference.pdf

Neural Information Processing SystemsAug-18-2025, 14:25:05 GMT

anchor client, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.69)
Information Technology > Communications (0.68)

Add feedback

81b8390039b7302c909cb769f8b6cd93-Paper-Conference.pdf

Neural Information Processing SystemsAug-16-2025, 12:26:38 GMT

adversarial sample, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces

Akritidis, Leonidas, Bozanis, Panayiotis

arXiv.org Artificial IntelligenceAug-4-2025

The tabular form constitutes the standard way of representing data in relational database systems and spreadsheets. But, similarly to other forms, tabular data suffers from class imbalance, a problem that causes serious performance degradation in a wide variety of machine learning tasks. One of the most effective solutions dictates the usage of Generative Adversarial Networks (GANs) in order to synthesize artificial data instances for the under-represented classes. Despite their good performance, none of the proposed GAN models takes into account the vector subspaces of the input samples in the real data space, leading to data generation in arbitrary locations. Moreover, the class labels are treated in the same manner as the other categorical variables during training, so conditional sampling by class is rendered less effective. To overcome these problems, this study presents ctdGAN, a conditional GAN for alleviating class imbalance in tabular datasets. Initially, ctdGAN executes a space partitioning step to assign cluster labels to the input samples. Subsequently, it utilizes these labels to synthesize samples via a novel probabilistic sampling strategy and a new loss function that penalizes both cluster and class mis-predictions. In this way, ctdGAN is trained to generate samples in subspaces that resemble those of the original data distribution. We also introduce several other improvements, including a simple, yet effective cluster-wise scaling technique that captures multiple feature modes without affecting data dimensionality. The exhaustive evaluation of ctdGAN with 14 imbalanced datasets demonstrated its superiority in generating high fidelity samples and improving classification accuracy.

artificial intelligence, ctdgan, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.00472

Country: Europe > Greece (0.28)

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Explaining Recovery Trajectories of Older Adults Post Lower-Limb Fracture Using Modality-wise Multiview Clustering and Large Language Models

Khan, Shehroz S., Abedi, Ali, Chu, Charlene H.

arXiv.org Artificial IntelligenceJun-17-2025

Interpreting large volumes of high-dimensional, unlabeled data in a manner that is comprehensible to humans remains a significant challenge across various domains. In unsupervised healthcare data analysis, interpreting clustered data can offer meaningful insights into patients' health outcomes, which hold direct implications for healthcare providers. This paper addresses the problem of interpreting clustered sensor data collected from older adult patients recovering from lower-limb fractures in the community. A total of 560 days of multimodal sensor data, including acceleration, step count, ambient motion, GPS location, heart rate, and sleep, alongside clinical scores, were remotely collected from patients at home. Clustering was first carried out separately for each data modality to assess the impact of feature sets extracted from each modality on patients' recovery trajectories. Then, using context-aware prompting, a large language model was employed to infer meaningful cluster labels for the clusters derived from each modality. The quality of these clusters and their corresponding labels was validated through rigorous statistical testing and visualization against clinical scores collected alongside the multimodal sensor data. The results demonstrated the statistical significance of most modality-specific cluster labels generated by the large language model with respect to clinical scores, confirming the efficacy of the proposed method for interpreting sensor data in an unsupervised manner. This unsupervised data analysis approach, relying solely on sensor data, enables clinicians to identify at-risk patients and take timely measures to improve health outcomes.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.12156

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Lightweight Trustworthy Distributed Clustering

Li, Hongyang, Wu, Caesar, Chadli, Mohammed, Mammar, Said, Bouvry, Pascal

arXiv.org Artificial IntelligenceApr-15-2025

Ensuring data trustworthiness within individual edge node s while facilitating collaborative data processing poses a critical challenge in edge computing system s (ECS), particularly in resource-constrained scenarios such as autonomous systems sensor networks, indu strial IoT, and smart cities. This paper presents a lightweight, fully distributed k -means clustering algorithm specifically adapted for edge e nvi-ronments, leveraging a distributed averaging approach wit h additive secret sharing, a secure multiparty computation technique, during the cluster center update ph ase to ensure the accuracy and trustworthiness of data across nodes. Edge computing, a paradigm emerging from distributed compu ting, emphasizes processing data at or near its source to minimize latency and reduce band width consumption [1]-[3]. The rapid advancements in edge computing technologies, includ ing algorithms for decentralized and efficient data processing, have significantly accelerated t he deployment of distributed sensor networks. Two key properties of ECS are crucial in large-scale deploym ents.

artificial intelligence, machine learning, node, (18 more...)

arXiv.org Artificial Intelligence

2504.10109

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.57)

Add feedback